library(ids)
library(tidyverse)
library(popdemo)Please do not share this document with the students. This is only for your use, either to get a sense of the data or as a crib sheet with information to pass on to students.
“Open” in open CMR means that the population is open to change, therefore we assume that animals can be born and die. We assume (read “we hope and pray”) that immigration and emigration are negligible and can be ignored.
Figuring out what the survival probability of a group of animals is, can be hugely valuable for conservation or management, e.g., if your population is crashing, is it because survival has taken a hit?
The issue is that to figure out survival, you need to take imperfect detection into account.
There are a variety of open CMR models. The one used in this practical is the “bog standard” one (at least to my mind). The Cormack-Jolly-Seber model (abbreivated to to CJS, named after the authors, though I often mistakenly call it the Colly-Jolly-Seber model).
Open CMR models are state-spaced models which allow the estimation of survival rates.
A state space model is merely one whereby an animals can “occupy” one of of multiple states. In open CMR models, these states are Alive or Dead.
Animals can transition between states (e.g., Sleeping to Moving to Foraging, as in Hidden Markov Chain models [HMMs], or Ocupied to Vacant to Occupied, as in occupancy models), but in open CMR models, the only transition can be from Alive to Dead.
Zombies do not exist, so an animal cannot go from Dead to Alive.
Open CMR models estimate survival probabilities and recapture probabilities.
Note that we rarely care about recapture probabilities, they are only included to deal with imperfect detection (though if you are designing a new type of trap, for example, you might care about recapture probabilities).
The reason why we only care about recapture probabilities, and not initial capture probabilities, is that you do not care about what the probability was for an animal to enter your trap initially. This does not help you determine if an animals survival. All it tells you is that that particular animal was alive, which you know already because it went into your trap.
Instead, you need to know the probability that it re-entered your trap (or you detected it having put a radio-collar on, or however you are doing “trapping”). With this information you can help tease apart if the animal was alive and observed, alive and unobserved, or dead and unobserved.
The two parameters are estimated together in two sub-models.
The state process model deals with survival, where, if the animal was alive (state = 1), then this is multiplied by the survival probability (assume 87%) to determine if the animal was likely to be alive the following year: \(1 \times 0.87 = 0.87\). If the animal is dead (state = 0), then this resolves to \(0 \times 0.87 = 0\).
The observation process model deals with whether or not the animal was recaptured, but this must be conditioned on whether the animal was alive or not. Dead animals have a hard time entering a trap. If the animal was likely to be alive, then the state is multiplied by the recapture probability. Assume the recapture probability is 32%, then if the animal was alive; \(1 \times 0.32 = 0.32\), while if it were dead; \(0 \times 0.32 = 0.32\). This is how open CMR models deal with animals who may or may not be alive, as well as observed or not (so long as it was alive to be observed). It also ensures that that dead animals cannot be captured.
As with closed CMR models, open CMR uses capture history (e.g., where 110 represented that an animal was capture on day 1 and day 2, but it was not captured on day 3).
Because we only care about recapture, if we have 20 years of capture histories, only 19 years are actually used to estimate recapture, as the first year would represent the initial capture probability. So for a CH of 110, the only information used for recapture probability is -10.
Similarly, because survival is the probability to successfully transition from one year to the next (and in doing so survive), if we have 20 years of data, we only have 19 transitions. For a CH of 110, the information used is the first transition, 11, and then the second transition 10.
This is why when you biuld models in MARK, your matrices will be shortened by one year for both recapture and survival probabilities.
Below is the code used to simulate the open CMR data as will be used by the students. Note that the code here is much more complex and involved than the closed CMR simulation, partly because the actual simulation is more complex but also because I am also not a particularly elegant coder. I have tried to comment as much as possible to make it relatively transparent, though there is one section where I can’t remember how I came up with the code. This section falls very much into the “I don’t know why it works but it does, so don’t touch it” (the section where I get the numerical age of the rabbits).
# Setting the seed keeps the randomness consistent
set.seed(666)
# How many years of data we will simulate
n_years <- 16
# Year that data collection started
start_year <- 2007
# Number of new individuals in each each age group captured each year
# Number drawn from a poission distribution (rpois)
# We generate n_years - 1 values (15 years of new captures)
# We capture on average 9 one year olds (lambda = 9)
# Values for lambda informed from a matrix model to get age structure based on survivals
marked_one <- rpois(n = n_years - 1, lambda = 9)
marked_two <- rpois(n = n_years - 1, lambda = 5)
marked_adult <- rpois(n = n_years - 1, lambda = 25)set.seed(666)
# One year old survival (in normal circumstance)
mean_phi_1 <- 0.49
# Two year old survival (in normal circumstances)
mean_phi_2 <- 0.64
# Adult survival (in normal circumstances)
mean_phi_adult <- 0.82
# survival of rabbits as they age
# Takes into account that animals grow up each year
phi_1 <- c(mean_phi_1, mean_phi_2, rep(mean_phi_adult, n_years - 3))
phi_2 <- c(mean_phi_2, rep(mean_phi_adult, n_years - 2))
phi_adult <- rep(mean_phi_adult, n_years - 1)Note that detection is kept constant. I could include some covariate to influence detection, but I’m inclined to leave it as is. Let me know if you have strong feelings on this.
set.seed(666)
# Mean recapture probability is 60% (note with open CMR, we do not care about initial capture)
p <- rep(0.6, n_years - 1)
# Create a matrix and plug in all of the 60% detection probabilities
P_1 <- matrix(rep(p, sum(marked_one)),
ncol = n_years - 1,
nrow = sum(marked_one),
byrow = TRUE)P_2 <- matrix(rep(p, sum(marked_two)),
ncol = n_years - 1,
nrow = sum(marked_two),
byrow = TRUE)P_A <- matrix(rep(p, sum(marked_adult)),
ncol = n_years - 1,
nrow = sum(marked_adult),
byrow = TRUE)There are three covariates that have an effect on rabbit survival.
The first is rabbit age. This was specified above, but acts as a contrast treatment. One year olds have the lowest, two-year olds have “medium” survival, and adults have the highest.
The second is mongoose presence, where mongoose presence lowers survival by 3.5%. This acts as another contrast treatment.
The final is distance from human building. This is a conceptually challenging covariate and can be easily misinterpreted by students. Human distance is a proxy variable (i.e., a variable which captures another variable). Here, human distance is meant to be a proxy for cat and dog proximity - both of which are a legitimate threat to the rabbits in real life. So part of the confusion comes from this being a proxy variable. The other part which causes confusion is that humans are getting closer each year - so a decreasing trend in human distance over time. However, the relationship with rabbit survival is positive. Keep in mind, however, that this means that the larger the value for human distance, the larger survival is. You want humans far away (because of their pets), but they are moving closer each year.
When we run this practical, make sure to go over this detail with the students and make sure they understand it. Last year, it caused a few of them to get confused. I choose to keep it in this year, because it requires genuine thought to make sense of it.
Mongoose presence is determine as \(logit(p_i) = 4 - 0.38 \times Year_i\), so probability of mongoose presence declines over the years. This reflects the control of mongoose in the site.
Human distance is a declining trend as well, based on the negative cumulative sum of the previous year and a trend (see the code).
Survival is generated according to:
\(\phi_i \sim binomial(p_i)\)
\(p_i = \beta_0 + \beta_12YearOld_i + \beta_2Adult_i + \beta_3Mongoose_i + \beta_4Human_i\)
where \(\phi_i\) is whether a rabbit survives or not (\(\phi\) = phi), \(p_i\) is the probability that the rabbit survives, \(\beta_0\) is the global intercept, which defaults to one year old survival here, \(\beta_1\) is the change in survival for two year olds, \(2YearOld_i\) is a dummy variable indicating if rabbit \(i\) is a two year old (1) or not (0), \(\beta_2\) is the change in survival for adults, \(Adult_i\) is a dummy variable for rabbit \(i\) being an adult, \(\beta_3\) is the effect of mongoose on rabbit \(i\), \(Mongoose_i\) is dummy variable indicating whether rabbit \(i\) was exposed to mongoose, \(\beta_4\) is the effect of human distance (and hence pets), and \(Human_i\) is the distance humans were from the site.
The values for the various \(\beta_k\) are included in the code below, though the exact values of these may alter very slightly with any updates.
set.seed(666)
# Mongoose predation
mongoose <- rbinom(n = n_years, size = 1, prob = plogis(4 + -0.38 * 1:n_years))
beta_mongoose <- -0.035
# Nearest human infrastructure (km)
trend <- 0.5 # Humans get 0.5 km closer every year
human <- 10 -cumsum(rnorm(n_years, 0.1) + trend) # Distance from the site
beta_human <- 0.006# Aiming for one year old survival of 0.52 at the lowest
# Define matrices with survival and recapture probabilities
# One year olds
PHI_1 <- matrix(0, # Fill the matrix with zeros initially
ncol = n_years - 1, # Number of years when rabbits could be recaptured
nrow = sum(marked_one)) # Total number of marked one year olds
# This line of code fills the matrix with the survival of yearlings in the current year
# Takes into account that each year there is a new batch of babies, who will then grow up
# The survival rates here are just the mean survival rates for that age (given one year olds grow into 2 year olds, etc.)
for (i in 1:length(marked_one)) {
PHI_1[(sum(marked_one[1:i]) - marked_one[i] + 1):sum(marked_one[1:i]), i:(n_years - 1)] <- matrix(rep(phi_1[1:(n_years - i)],
marked_one[i]),
ncol = n_years - i,
byrow = TRUE)
}
# Take the mean survival and adjust them according to mongoose presence and human distance
# The ifelse is ensuring that we do this only for rabbits alive
for(i in 1:ncol(PHI_1)){
PHI_1[,i] <- ifelse(PHI_1[,i] != 0, PHI_1[,i] + beta_mongoose * mongoose[i] + beta_human * human[i], PHI_1[,i])
}# Aiming for two year old survival of 0.62 at the lowest
# Code the same as for one year olds
PHI_2 <- matrix(0,
ncol = n_years - 1,
nrow = sum(marked_two))
for (i in 1:length(marked_two)) {
PHI_2[(sum(marked_two[1:i]) - marked_two[i] + 1):sum(marked_two[1:i]), i:(n_years - 1)] <- matrix(rep(phi_2[1:(n_years - i)],
marked_two[i]),
ncol = n_years - i,
byrow = TRUE)
}
for(i in 1:ncol(PHI_2)){
PHI_2[,i] <- ifelse(PHI_2[,i] != 0, PHI_2[,i] + beta_mongoose * mongoose[i] + beta_human * human[i], PHI_2[,i])
}# AIming for adult survival of 0.83 at the lowest
PHI_a <- matrix(rep(phi_adult, sum(marked_adult)),
ncol = n_years - 1,
nrow = sum(marked_adult),
byrow = TRUE)
for(i in 1:ncol(PHI_a)){
PHI_a[,i] <- ifelse(PHI_a[,i] != 0, PHI_a[,i] + beta_mongoose * mongoose[i] + beta_human * human[i], PHI_a[,i])
}The base version of this code comes from Olivier Giminez, I think from his online Bayesian CMR course but I may be wrong.
simul.cjs <- function(PHI, P, marked) { # User must provide the survival and recapture matrices, as well as the number of marked animals
# How many years from the PHI matrix, plus 1 year (survival is a transition between 2 years)
n.occasions <- dim(PHI)[2] + 1
# Create an empty matrix to fill with the CH
CH <- matrix(0,
ncol = n.occasions,
nrow = sum(marked))
# Vector with numerical value of when an animal was caught
mark.occ <- rep(1:length(marked),
marked[1:length(marked)])
# fill the CH matrix
for(i in 1:sum(marked)) { # For each individual in turn
CH[i, mark.occ[i]] <- 1 # CH has a 1 assigned when it was caught
if(mark.occ[i] == n.occasions) next # If this is the final year of trapping, skip to the next individual
for(t in (mark.occ[i] + 1):n.occasions) { # For each year an animal could have been alive
sur <- rbinom(1, 1, PHI[i,t-1]) # See if it survives based on its survival for the proceeding year
if(sur == 0) break # If it died, then stop (don't keep trying to figure out if it survived in later years)
rp <- rbinom(1, 1, P[i, t-1]) # If it was alive, was it caught? Uses recapture prob to determine
if(rp == 1) CH[i,t] <- 1 # Assign a 1 in the capture history
} # close t loop
} # close i loop
return(CH) # Spit out the capture history
} # close functionUse the information from above to generate actual capture histories for the various rabbits.
set.seed(666)
# One year old CH
CH_1 <- simul.cjs(PHI_1, P_1, marked_one) # individuals marked as yearlings
# Two year old CH
CH_2 <- simul.cjs(PHI_2, P_2, marked_two) # individuals marked as sub-adults
# Adult CH
CH_A <- simul.cjs(PHI_a, P_A, marked_adult) # individuals marked as adults
# Bind the rows of each CH together
CH <- rbind(CH_1, CH_2, CH_A)
# Create vector with occasion of marking
# I.e., when was an animal first captured
get.first <- function(x) min(which(x!=0))
f_1 <- apply(CH_1, 1, get.first)
f_2 <- apply(CH_2, 1, get.first)
f_a <- apply(CH_A, 1, get.first)
# Create empty matrices X indicating age classes which will be filled later
x_1 <- matrix(NA, ncol = dim(CH_1)[2]-1, nrow = dim(CH_1)[1])
x_2 <- matrix(NA, ncol = dim(CH_2)[2]-1, nrow = dim(CH_2)[1])
x_a <- matrix(NA, ncol = dim(CH_A)[2]-1, nrow = dim(CH_A)[1])
# Yearling data
# This loop distinguishes between adult and non-adult
for(i in 1:nrow(CH_1)){
for(t in f_1[i]:(ncol(CH_1) - 1)){
x_1[i, t] <- 3
x_1[i, f_1[i]] <- 1
} #t
} #i
# This loop specifies of those non-adults, which were 1 year olds
for( c in 2:ncol(x_1)){
x_1[,c]<- ifelse(!is.na(x_1[, c - 1]) & x_1[, c - 1] == 1, 2, x_1[, c])
}
# Adult and non-adult (here, all non-adults are 2 year olds)
for(i in 1:nrow(CH_2)){
for(t in f_2[i]:(ncol(CH_2) - 1)){
x_2[i, t] <- 3
x_2[i, f_2[i]] <- 2
} #t
} #i
# Which were adults
for(i in 1:nrow(CH_A)){
for(t in f_a[i]:(ncol(CH_A)-1)){
x_a[i, t] <- 3
} #t
} #iThis is age when the rabbit was first captured, to be used in the .INP file as a group.
set.seed(666)
# Unique animal IDs
rabbit_names <- adjective_animal(n = nrow(CH))
# Starting with age
age <- rbind(x_1, x_2, x_a)
for(i in 1:ncol(age)) {
age[,i] <- ifelse(age[,i] == 1,
"One year old",
ifelse(age[,i] == 2,
"Two year old",
ifelse(age[,i] == 3,
"Adult",
age[,i])))
}
age <- as.data.frame(age)
age$ID <- rabbit_names
seq_names <- seq(start_year, start_year + n_years - 2, 1)
colnames(age)[1:length(seq_names)] <- seq_names
age <- age[, c(16, 1:15)]
# write.csv(age, "open_CMR_age_covariate_2022.csv", row.names = FALSE)This is constant for all rabbits. Mongoose were either present, and impacted all rabbits or they were not.
set.seed(666)
mongoose_df <- matrix(NA,
ncol = n_years,
nrow = nrow(CH))
for(i in 1:ncol(mongoose_df)){
mongoose_df[,i] <- mongoose[i]
}
mongoose_df <- as.data.frame(mongoose_df)
mongoose_df$ID <- rabbit_names
seq_names <- seq(start_year, start_year + n_years, 1)
colnames(mongoose_df)[1:length(seq_names)] <- seq_names
mongoose_df <- mongoose_df[, c(17, 1:16)]
# write.csv(mongoose_df, "open_CMR_mongoose_covariate.csv", row.names = FALSE)Humans getting closer to the site over time, lowering survival. Keep in mind this reflects predation by pets, and maybe the odd human every now and again…
human_df <- matrix(NA,
ncol = n_years,
nrow = nrow(CH))
for(i in 1:nrow(human_df)) {
human_df[i,] <- human
}
human_df <- as.data.frame(human_df)
human_df$ID <- rabbit_names
seq_names <- seq(start_year, start_year + n_years - 1, 1)
colnames(human_df)[1:length(seq_names)] <- seq_names
human_df <- human_df[, c(17, 1:16)]
# write.csv(human_df, "open_CMR_human_distance_covariate.csv", row.names = FALSE)CH_df <- as.data.frame(CH)
CH_df$ID <- rabbit_names
seq_names <- seq(start_year, start_year + n_years - 1, 1)
colnames(CH_df)[1:length(seq_names)] <- seq_names
CH_df <- CH_df[, c(17, 1:16)]
# write.csv(CH_df, "open_CMR_capture_history.csv", row.names = FALSE)The ribbons here are the survival rates I was targetting based on the matrix models. The upper part of the ribbon results in a stable population, the bottom of the ribbon results in a slowly declining population (still extant after 25 years).
df <- data.frame(
age = c(rep("One", 16), rep("Two", 16), rep("Adult", 16)),
year = c(2007:2022, 2007:2022, 2007:2022),
survival = c((mean_phi_1 + beta_mongoose * mongoose + beta_human * human),
(mean_phi_2 + beta_mongoose * mongoose + beta_human * human),
(mean_phi_adult + beta_mongoose * mongoose + beta_human * human)),
mongoose = c(mongoose, mongoose, mongoose),
human = c(human, human, human)
)
df$age <- factor(df$age, levels = c("One", "Two", "Adult"))
df$mean_phi <- c(rep(mean_phi_1, 16), rep(mean_phi_2, 16), rep(mean_phi_adult, 16))
df$low_phi <- c(rep(0.52, 16), rep(0.62, 16), rep(0.83, 16))
plotly::ggplotly(ggplot(df) +
geom_point(aes(x = year, y = survival, colour = age),
size = 2) +
geom_path(aes(x = year, y = survival, colour = age)) +
geom_ribbon(aes(x = year, y = survival, ymax = mean_phi, ymin = low_phi, fill = age), alpha = 0.2) +
scale_y_continuous(labels = scales::percent) +
labs(x = "Year",
y = "Survival",
colour = "Age",
fill = "Age") +
scale_colour_brewer(palette = "Set2") +
scale_fill_brewer(palette = "Set2") +
theme_bw() +
theme_minimal()
)A table summarising the range of true survivals over time. Variation is caused exclusively by age, mongoose presence, and human distance. These are not estimates, they are the actual values.
df$mongoose <- ifelse(df$mongoose == 1, "Mongoose Present", "Mongoose Absent")
df %>%
group_by(age) %>%
summarise(mean_survival = mean(survival),
min_survival = min(survival),
max_survival = max(survival))Figure below shows impact of all three covariates (age, mongoose presence, and distance to humans). Note here that increasing human distance is good, hence the parameter is positive. Be sure to explain to students that this is bad, because the raw data shows humans getting closer, not further away.
plotly::ggplotly(ggplot(df) +
geom_point(aes(x = human, y = survival, colour = age),
size = 2) +
geom_smooth(aes(x = human, y = survival, colour = age),
method = "lm") +
geom_ribbon(aes(x = human, y = survival, ymax = mean_phi, ymin = low_phi, fill = age), alpha = 0.2) +
scale_y_continuous(labels = scales::percent) +
facet_grid( ~ mongoose) +
labs(x = "Distance to humans",
y = "Survival",
colour = "Age",
fill = "Age") +
scale_colour_brewer(palette = "Set2") +
scale_fill_brewer(palette = "Set2") +
theme_bw() +
theme_minimal()
)The figure below is generated using matrix models. I assume a sex ratio of 50%, 1.1 offspring produced per year, and use the survival rates from the above table.
The average survival shows that the population is declining because both mongoose and humans negatively effect survival, and either one is generally “active” (i.e., something is always lowering survival). When humans are still far away and mongoose are not having an effect (the maximum survival), the survival rates lead to a stable population.
# Minimum survival
numbers_at_stage <- c(58, 33, 159)
phi_1 <- min(df$survival[df$age == "One"])
phi_2 <- min(df$survival[df$age == "Two"])
phi_a <- min(df$survival[df$age == "Adult"])
sigma <- 0.5
m_2 <- 1.1
m_a <- 1.1
F_2 <- m_2 * phi_1 * sigma
F_a <- m_a * phi_1 * sigma
rabbit <- as.matrix(data.frame(
one = c(0, phi_1, 0),
two = c(F_2, 0, phi_2),
adults = c(F_a, 0, phi_a)
))
mini <- project(A = rabbit, vector = numbers_at_stage, time = 25)
# mean survival
phi_1 <- mean(df$survival[df$age == "One"])
phi_2 <- mean(df$survival[df$age == "Two"])
phi_a <- mean(df$survival[df$age == "Adult"])
F_2 <- m_2 * phi_1 * sigma
F_a <- m_a * phi_1 * sigma
rabbit <- as.matrix(data.frame(
one = c(0, phi_1, 0),
two = c(F_2, 0, phi_2),
adults = c(F_a, 0, phi_a)
))
avg <- project(A = rabbit, vector = numbers_at_stage, time = 25)
# maximum survival
phi_1 <- max(df$survival[df$age == "One"])
phi_2 <- max(df$survival[df$age == "Two"])
phi_a <- max(df$survival[df$age == "Adult"])
F_2 <- m_2 * phi_1 * sigma
F_a <- m_a * phi_1 * sigma
rabbit <- as.matrix(data.frame(
one = c(0, phi_1, 0),
two = c(F_2, 0, phi_2),
adults = c(F_a, 0, phi_a)
))
maxi <- project(A = rabbit, vector = numbers_at_stage, time = 25)
df_min <- data.frame(
population = c(mini@.Data, avg@.Data, maxi@.Data),
years = rep(1:length(mini@.Data), 3),
model = c(rep("Minimum survival", length(mini@.Data)),
rep("Mean survival", length(mini@.Data)),
rep("Maximum survival", length(mini@.Data)))
)
plotly::ggplotly(ggplot(df_min) +
geom_point(aes(x = years, y = population, colour = model)) +
geom_path(aes(x = years, y = population, colour = model)) +
ylim(c(0, 275)) +
labs(y = "Population size",
x = "Years into future",
colour = "Variation in survival") +
scale_colour_brewer(palette = "Set2") +
theme_bw() +
theme_minimal()
)